How to contribute
=================

We welcome contributions to the Network Datasets repository! This page provides guidelines for contributing datasets, code improvements, and documentation.

Types of Contributions
----------------------

We accept several types of contributions:

* **New datasets**: Infrastructure network datasets following our format
* **Code improvements**: Bug fixes, new features, performance optimizations
* **Documentation**: Improvements to existing docs, new tutorials
* **Testing**: Additional test cases, validation improvements
* **Examples**: New Jupyter notebooks, usage examples

Getting Started
---------------

1. **Fork the repository** on GitHub
2. **Clone your fork** locally:

   .. code-block:: bash

      git clone https://github.com/your-username/network-datasets.git
      cd network-datasets

3. **Create a development environment**:

   .. code-block:: bash

      conda create -n network-datasets-dev python=3.9
      conda activate network-datasets-dev
      pip install -e ".[dev]"

4. **Install pre-commit hooks** (optional but recommended):

   .. code-block:: bash

      pip install pre-commit
      pre-commit install

Adding New Datasets
-------------------

Dataset Structure
~~~~~~~~~~~~~~~~~

New datasets should follow this directory structure:

.. code-block:: text

   dataset-name/
   ├── dataset.yaml          # Dataset metadata
   └── v1/                   # Version directory
       ├── data/             # Data files
       │   ├── nodes.json    # Node definitions
       │   ├── edges.json    # Edge definitions
       │   └── probs.json    # Probability data
       ├── docs/             # Documentation
       │   ├── README.md     # Dataset description
       │   ├── PROVENANCE.md # Data source information
       │   └── CHANGELOG.md  # Version history
       └── scripts/          # Analysis scripts (optional)
           └── example.ipynb
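
The layout above can be created by hand. As a convenience, here is a minimal sketch that scaffolds it with ``pathlib``. The ``scaffold_dataset`` helper is hypothetical and not part of the repository's tooling, and the placeholder files it creates are empty, so they still need real content before validation.

.. code-block:: python

   from pathlib import Path


   def scaffold_dataset(root: Path, name: str, version_dir: str = "v1") -> Path:
       """Create an empty skeleton for a new dataset (hypothetical helper)."""
       dataset_dir = root / name
       version_path = dataset_dir / version_dir

       # Sub-directories from the layout above; scripts/ is optional.
       for sub in ("data", "docs", "scripts"):
           (version_path / sub).mkdir(parents=True, exist_ok=True)

       # Empty placeholder files; touch() does not overwrite existing content.
       placeholders = [
           dataset_dir / "dataset.yaml",
           version_path / "data" / "nodes.json",
           version_path / "data" / "edges.json",
           version_path / "data" / "probs.json",
           version_path / "docs" / "README.md",
           version_path / "docs" / "PROVENANCE.md",
           version_path / "docs" / "CHANGELOG.md",
       ]
       for path in placeholders:
           path.touch(exist_ok=True)
       return dataset_dir

Calling ``scaffold_dataset(Path("."), "your-dataset-name")`` from the repository root would create ``your-dataset-name/`` with the ``v1/`` tree inside it.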

Required Files
~~~~~~~~~~~~~~

**dataset.yaml**
   Dataset metadata file with the following structure:

   .. code-block:: yaml

      name: dataset-name
      version: 1.0.0
      title: Human-readable title
      license: CC-BY-4.0
      description: >
        Detailed description of the dataset including:
        - What type of infrastructure network
        - Number of nodes and edges
        - Data source and methodology
        - Use cases and applications
      contacts:
        - name: Your Name
          affiliation: Your Institution
          email: your.email@example.com
      tags: [power, transportation, water]  # replace with tags that describe your dataset
      files:
        nodes: data/nodes.json
        edges: data/edges.json
        probs: data/probs.json
      citation: |
        Citation information for the dataset

**nodes.json**
   Node definitions, keyed by node ID, in the following format:

   .. code-block:: json

      {
        "node_id": {
          "x": 0.0,
          "y": 0.0,
          "type": "optional_type",
          "additional_attributes": "optional"
        }
      }

**edges.json**
   Edge definitions, keyed by edge ID, in the following format:

   .. code-block:: json

      {
        "edge_id": {
          "from": "node1",
          "to": "node2",
          "directed": false,
          "additional_attributes": "optional"
        }
      }

**probs.json**
   Probability data for edge failures, keyed by edge ID and then by state label:

   .. code-block:: json

      {
        "edge_id": {
          "1": {"p": 0.95},
          "0": {"p": 0.05}
        }
      }
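
The three files refer to each other by ID: edges name their endpoint nodes, and ``probs.json`` is keyed by edge ID. The following is a rough sketch of a cross-reference check on a tiny in-memory example. The ``find_dangling_edges`` helper is hypothetical, and the repository's own validator may check more or differently.

.. code-block:: python

   import json


   def find_dangling_edges(nodes: dict, edges: dict) -> list[str]:
       """Return IDs of edges whose "from" or "to" node is missing from nodes."""
       return [
           edge_id
           for edge_id, edge in edges.items()
           if edge["from"] not in nodes or edge["to"] not in nodes
       ]


   # Tiny example mirroring the three file formats shown above.
   nodes = json.loads('{"n1": {"x": 0.0, "y": 0.0}, "n2": {"x": 1.0, "y": 0.0}}')
   edges = json.loads('{"e1": {"from": "n1", "to": "n2", "directed": false}}')
   probs = json.loads('{"e1": {"1": {"p": 0.95}, "0": {"p": 0.05}}}')

   print(find_dangling_edges(nodes, edges))  # []
   print(sorted(set(probs) - set(edges)))    # [] (no probs entry without an edge)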

Data Quality Guidelines
~~~~~~~~~~~~~~~~~~~~~~~

* **Coordinates**: Use consistent units (e.g., kilometers) and a single coordinate system
* **Node IDs**: Use descriptive, unique identifiers
* **Edge IDs**: Use descriptive, unique identifiers
* **Attributes**: Include relevant metadata (capacity, type, etc.)
* **Probabilities**: Ensure probabilities sum to 1.0 for each edge
* **Validation**: All data must pass schema validation
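
The probability rule is easy to check before running the full validator. Below is a minimal sketch, assuming the ``probs.json`` layout shown on this page (edge ID, then state label, then ``p``). The helper name and the tolerance are illustrative choices, not the project's official check.

.. code-block:: python

   import math


   def edges_with_bad_probabilities(probs: dict, tol: float = 1e-9) -> list[str]:
       """Return IDs of edges whose state probabilities do not sum to 1.0."""
       bad = []
       for edge_id, states in probs.items():
           total = sum(state["p"] for state in states.values())
           # Compare with a tolerance: floating-point sums are rarely exact.
           if not math.isclose(total, 1.0, abs_tol=tol):
               bad.append(edge_id)
       return bad


   probs = {
       "e1": {"1": {"p": 0.95}, "0": {"p": 0.05}},  # sums to 1.0
       "e2": {"1": {"p": 0.90}, "0": {"p": 0.05}},  # sums to 0.95
   }
   print(edges_with_bad_probabilities(probs))  # ['e2']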

Dataset Documentation
~~~~~~~~~~~~~~~~~~~~~

Create comprehensive documentation for your dataset:

**README.md**
   Include:

   * Dataset overview and purpose
   * Data source and methodology
   * Network statistics (nodes, edges, connectivity)
   * Usage examples
   * Citation information

**PROVENANCE.md**
   Include:

   * Original data source
   * Processing steps and transformations
   * Assumptions and limitations
   * Data quality notes

**CHANGELOG.md**
   Track changes and updates to the dataset.

Validation
~~~~~~~~~~

Before submitting, validate your dataset:

.. code-block:: bash

   # Validate all datasets
   python data_validate.py --root .

   # Validate specific dataset
   python data_validate.py --root . --dataset your-dataset-name

Update Registry
~~~~~~~~~~~~~~~

Add your dataset to the ``registry.json`` file:

.. code-block:: json

   [
     {
       "name": "your-dataset-name",
       "version": "1.0.0",
       "path": "your-dataset-name/v1",
       "summary": "Brief description of your dataset",
       "license": "CC-BY-4.0"
     }
   ]
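
Editing ``registry.json`` by hand works fine. If you prefer to script it, the sketch below appends an entry only when the same name and version are not already listed. It assumes the registry is a JSON list of objects as shown above, and ``add_registry_entry`` is a hypothetical helper, not an existing project function.

.. code-block:: python

   import json
   from pathlib import Path


   def add_registry_entry(registry_path: Path, entry: dict) -> bool:
       """Append entry to the registry; return False if it is already listed."""
       registry = json.loads(registry_path.read_text(encoding="utf-8"))

       # Skip the write when this name/version pair is already registered.
       for existing in registry:
           if (existing["name"], existing["version"]) == (entry["name"], entry["version"]):
               return False

       registry.append(entry)
       registry_path.write_text(json.dumps(registry, indent=2) + "\n", encoding="utf-8")
       return True

Note that this rewrites the whole file with two-space indentation, which may differ from the formatting already used in the repository.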

Code Contributions
------------------

Code Style
~~~~~~~~~~

* Follow PEP 8 style guidelines
* Use type hints for function parameters and return values
* Write docstrings for all public functions
* Use meaningful variable and function names
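
As an illustration of these points, a function written in this style might look like the following. The function itself is a made-up example, not part of the codebase, and the Google-style docstring layout is one common choice rather than a stated project requirement.

.. code-block:: python

   def node_degrees(edges: dict[str, dict]) -> dict[str, int]:
       """Count how many edges touch each node.

       Args:
           edges: Mapping of edge ID to a dict with "from" and "to" node IDs,
               in the same shape as edges.json.

       Returns:
           Mapping of node ID to its number of incident edges. A self-loop
           counts twice for its node.
       """
       degrees: dict[str, int] = {}
       for edge in edges.values():
           for endpoint in (edge["from"], edge["to"]):
               degrees[endpoint] = degrees.get(endpoint, 0) + 1
       return degrees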

Testing
~~~~~~~

* Write tests for new functionality
* Ensure all existing tests pass
* Aim for good test coverage

.. code-block:: bash

   # Run tests
   pytest tests/

   # Run with coverage
   pytest --cov=ndtools tests/
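
For orientation, a small pytest-style test could look like the sketch below. Both the helper under test and the test are hypothetical; in a real contribution the helper would live in the package and be imported by a file under ``tests/``.

.. code-block:: python

   def undirected_edge_ids(edges: dict) -> list:
       """Return IDs of edges whose "directed" flag is false (hypothetical helper)."""
       return [edge_id for edge_id, edge in edges.items() if not edge["directed"]]


   def test_undirected_edge_ids_skips_directed_edges():
       edges = {
           "e1": {"from": "n1", "to": "n2", "directed": False},
           "e2": {"from": "n2", "to": "n3", "directed": True},
       }
       # pytest collects functions named test_* and reports failed asserts.
       assert undirected_edge_ids(edges) == ["e1"]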

Documentation
~~~~~~~~~~~~~

* Update docstrings for modified functions
* Add examples to the documentation
* Update the API reference if needed

Pull Request Process
--------------------

1. **Create a feature branch**:

   .. code-block:: bash

      git checkout -b feature/your-feature-name

2. **Make your changes** and commit them:

   .. code-block:: bash

      git add .
      git commit -m "Add your dataset: brief description"

3. **Push to your fork**:

   .. code-block:: bash

      git push origin feature/your-feature-name

4. **Create a pull request** on GitHub with:

   * Clear description of changes
   * Reference to any related issues
   * Screenshots for UI changes
   * Test results

Pull Request Guidelines
~~~~~~~~~~~~~~~~~~~~~~~

* Keep PRs focused on a single feature or dataset
* Write clear, descriptive commit messages
* Respond to review feedback promptly
* Update documentation as needed
* Ensure all tests pass

Review Process
--------------

All contributions are reviewed by maintainers:

* **Code quality**: Style, functionality, tests
* **Data quality**: Validation, documentation, format compliance
* **Documentation**: Clarity, completeness, accuracy
* **Testing**: Coverage, correctness

Reviewers may request changes before merging.

License
-------

By contributing to this project, you agree that your contributions will be licensed under the same licenses as the project:

* **Code**: MIT License
* **Data**: CC-BY-4.0 License

This means your contributions can be used by others under these terms.

Getting Help
------------

If you need help with contributing:

* **Open an issue** on GitHub for questions
* **Check existing issues** for similar questions
* **Read the documentation** thoroughly
* **Ask in discussions** for general questions

Recognition
-----------

Contributors will be recognized in:

* The project's README.md file
* Release notes for significant contributions
* The project's documentation

Thank you for contributing to the Network Datasets project!